Transforming Strings to Vector Spaces Using Prototype Selection

Authors

  • Barbara Spillmann
  • Michel Neuhaus
  • Horst Bunke
  • Elzbieta Pekalska
  • Robert P. W. Duin
Abstract

A common way of expressing string similarity in structural pattern recognition is the edit distance. It allows one to apply the kNN rule in order to classify a set of strings. However, compared to the wide range of elaborate classifiers known from statistical pattern recognition, this is only a very basic method. In the present paper we propose a method for transforming strings into n-dimensional real vector spaces based on prototype selection. This allows us to subsequently classify the transformed strings with more sophisticated classifiers, such as support vector machines and other kernel-based methods. In a number of experiments, we show that the recognition rate can be significantly improved by means of this procedure.
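
The transformation described in the abstract can be illustrated with a minimal sketch: each string is mapped to the vector of its edit distances to n prototype strings. The function names below (`edit_distance`, `embed`, `select_prototypes`) are hypothetical, the edit distance assumes unit costs, and the farthest-first selection is only an illustrative stand-in; the abstract does not say which prototype selectors or edit costs the paper uses.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (an assumption of this sketch)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def embed(s: str, prototypes: Sequence[str]) -> List[float]:
    """Map a string to the vector of its edit distances to each prototype."""
    return [float(edit_distance(s, p)) for p in prototypes]


def select_prototypes(strings: Sequence[str], n: int) -> List[str]:
    """Farthest-first selection: an illustrative choice, not the paper's method."""
    pool = list(dict.fromkeys(strings))  # drop duplicates, keep order
    if not pool or n <= 0:
        return []
    chosen = [pool[0]]
    while len(chosen) < min(n, len(pool)):
        # Pick the string whose nearest chosen prototype is farthest away.
        best = max((s for s in pool if s not in chosen),
                   key=lambda s: min(edit_distance(s, p) for p in chosen))
        chosen.append(best)
    return chosen
```

With prototypes chosen from a training set, `embed(s, prototypes)` yields an n-dimensional real vector for any string `s`, which can then be passed to any vector-space classifier, for example a support vector machine.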

Similar resources

Transforming Pixel Signatures into an Improved Metric Space

Our previous work in computer-aided mammography has used scale orientation pixel signatures to provide a rich description of local structure. However, when treated as vectors for statistical classification, the Euclidean space they define has unsatisfactory metric properties. In this paper we describe a scheme that makes use of a previously described signature similarity measure to define a non-li...

Kernel Dependency Estimation

We consider the learning problem of finding a dependency between a general class of objects and another, possibly different, general class of objects. The objects can be for example: vectors, images, strings, trees or graphs. Such a task is made possible by employing similarity measures in both input and output spaces using kernel functions, thus embedding the objects into vector spaces. We exp...

Discriminative prototype selection methods for graph embedding

Graphs possess a strong representational power for many types of patterns. However, a main limitation in their use for pattern analysis derives from their difficult mathematical treatment. One way of circumventing this problem is that of transforming the graphs into a vector space by means of graph embedding. Such an embedding can be conveniently obtained by using a set of "prototype" graphs ...

Prototype Selection for Interpretable Classification

Prototype methods seek a minimal subset of samples that can serve as a distillation or condensed view of a data set. As the size of modern data sets grows, being able to present a domain specialist with a short list of “representative” samples chosen from the data set is of increasing interpretative value. While much recent statistical research has been focused on producing sparse-in-the-variab...

Prototype Selection for Classification in Standard and Generalized Dissimilarity Spaces

A common way to represent patterns for recognition systems is by feature vectors lying in some space. If this representation is based only on the predefined object features, it is independent of the other objects. In contrast, a dissimilarity representation of objects takes into account the relations between them by some measure of resemblance (e.g. dissimilarity). The nearest neighbour (1-NN) ...

Publication date: 2006